MultiPhyl: a high-throughput phylogenomics webserver using distributed computing

Authors

  • Thomas M. Keane
  • Thomas J. Naughton
  • James O. McInerney
Abstract

With the number of fully sequenced genomes increasing steadily, there is greater interest in performing large-scale phylogenomic analyses from large numbers of individual gene families. Maximum likelihood (ML) has been shown repeatedly to be one of the most accurate methods for phylogenetic construction. Recently, there have been a number of algorithmic improvements in maximum-likelihood-based tree search methods. However, it can still take a long time to analyse the evolutionary history of many gene families using a single computer. Distributed computing refers to a method of combining the computing power of multiple computers in order to perform some larger overall calculation. In this article, we present the first high-throughput implementation of a distributed phylogenetics platform, MultiPhyl, capable of using the idle computational resources of many heterogeneous non-dedicated machines to form a phylogenetics supercomputer. MultiPhyl allows a user to upload hundreds or thousands of amino acid or nucleotide alignments simultaneously and perform computationally intensive tasks such as model selection, tree searching and bootstrapping of each of the alignments using many desktop machines. The program implements a set of 88 amino acid models and 56 nucleotide maximum likelihood models and a variety of statistical methods for choosing between alternative models. A MultiPhyl webserver is available for public use at: http://www.cs.nuim.ie/distributed/multiphyl.php.
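The abstract mentions statistical methods for choosing between alternative models but does not name them. As a minimal sketch, assuming two standard information criteria (AIC and BIC) stand in for those methods, the following shows how candidate substitution models fitted to one alignment could be ranked. The `ModelFit` type, the function names, and all log-likelihood values are hypothetical and are not taken from MultiPhyl.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class ModelFit:
    """Result of fitting one substitution model to one alignment (hypothetical type)."""
    name: str
    log_likelihood: float  # maximised log-likelihood of the model
    num_params: int        # free model parameters (branch lengths excluded)


def aic(fit: ModelFit) -> float:
    """Akaike Information Criterion: lower is better."""
    return 2 * fit.num_params - 2 * fit.log_likelihood


def bic(fit: ModelFit, num_sites: int) -> float:
    """Bayesian Information Criterion: lower is better."""
    return fit.num_params * math.log(num_sites) - 2 * fit.log_likelihood


def select_model(fits: Iterable[ModelFit],
                 score: Callable[[ModelFit], float]) -> ModelFit:
    """Return the candidate with the lowest score under the given criterion."""
    return min(fits, key=score)


# Illustrative values only; these are not results from the paper.
candidates = [
    ModelFit("JC69", log_likelihood=-5230.4, num_params=0),
    ModelFit("HKY85", log_likelihood=-5105.2, num_params=4),
    ModelFit("GTR+G", log_likelihood=-5098.7, num_params=9),
]

best_by_aic = select_model(candidates, aic)
best_by_bic = select_model(candidates, lambda f: bic(f, num_sites=500))
print(best_by_aic.name, best_by_bic.name)
```

With these made-up numbers the two criteria disagree: AIC prefers the parameter-rich model, while BIC, which penalises parameters more heavily for a 500-site alignment, prefers the simpler one. Cases like this are one reason a tool might offer several selection methods.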

Similar resources

Computational methods for statistical phylogenetic inference

This thesis comprises three main parts addressing questions on the selection of maximum likelihood models in protein-based phylogenetics, the application of distributed computing to phylogenetic analysis, and an analysis of the evolutionary relationships among the eukaryotes. In recent years, model-based approaches such as maximum likelihood have become the methods of choice for constructing...

Phylogenomic inference of protein molecular function: advances and challenges

MOTIVATION: Protein families evolve a multiplicity of functions through gene duplication, speciation and other processes. As a number of studies have shown, standard methods of protein function prediction produce systematic errors on these data. Phylogenomic analysis--combining phylogenetic tree construction, integration of experimental data and differentiation of orthologs and paralogs--has bee...

The Mathematics of Phylogenomics

The grand challenges in biology today are being shaped by powerful high-throughput technologies that have revealed the genomes of many organisms, global expression patterns of genes, and detailed information about variation within populations. We are therefore able to ask, for the first time, fundamental questions about the evolution of genomes, the structure of genes and their regulation, and ...

NBC: the Naïve Bayes Classification tool webserver for taxonomic classification of metagenomic reads

MOTIVATION: Datasets from high-throughput sequencing technologies have yielded a vast amount of data about organisms in environmental samples. Yet, it is still a challenge to assess the exact organism content in these samples because the task of taxonomic classification is too computationally complex to annotate all reads in a dataset. An easy-to-use webserver is needed to process these reads. W...

HiMAP: robust Phylogenomics from Highly Multiplexed Amplicon sequencing

High-throughput sequencing has fundamentally changed how molecular phylogenetic datasets are assembled, and phylogenomic datasets commonly contain 50-100-fold more loci than those generated using traditional Sanger-based approaches. Here, we demonstrate a new approach for building phylogenomic datasets using single tube, highly multiplexed amplicon sequencing, which we name HiMAP...

Journal title:

Volume 35, Issue

Pages -

Publication date: 2007